"Befriending a Geocoder" by Diana Shkolnikov Live captioning by Norma Miller. @whitecoatcapxg >> Thank you all for coming. I'm Diana Shkolnikov, I work for Mapzen and I lead the geocoding team, and I'm here to talk to you guys about geocoders. I was reading a blog post, or a Medium article, recently about what to do when you're presenting, because I was trying to prepare for this talk, and the article said, you know, it's all about telling a story. I sometimes take things a little too literally, so I'm going to actually read a story that I wrote, and hopefully this goes well. So the story is about a geocoder. Geocoders are mysterious systems that people kind of love, take for granted, and hate when they do the wrong thing. And I think talking a little bit about the history and the details of what goes into making a geocoder will help pull back the velvet curtain and make people less afraid of these mysterious systems. And so: Befriending a Geocoder, written by me, illustrated by the internet. I borrowed heavily. Let's start. I'll be turning the pages and you guys can follow along. I don't recommend you read the slides now; I'll be reading the story to you. They're for later, when you want to read this to your children at bedtime, which I'm sure you all will. Our story begins a long time ago, in the 1960s, when the first geographical information systems began to take shape in the faraway land of Canada. Dr. Roger Tomlinson, a brilliant man, created a system that would catalog data pertaining to agriculture, wildlife and forestry. Look how friendly he looks in that picture, isn't that great? Shortly thereafter, some of the finest universities in the United States, such as Harvard and Yale, followed suit and implemented their own versions of early geospatial search engines. A team of Yale graduates and students developed a protocol they called Dual Independent Map Encoding, DIME for short.
This groundbreaking protocol paved the way for geocoding algorithms still used in some of today's most popular commercial geocoders, such as Google and MapQuest. There are links below if you guys want to read more later. Quick fun fact: New Haven, Connecticut, the home of Yale University, was the first city on earth with a topologically integrated, geocodable street network database. So next time you're in that city, you can tell them they should be proud. All right. Turning the page. Using U.S. census data from 1970 and 1980, the first automated system to store and retrieve city address data using city blocks and house numbers. I love this picture. This was taken from a census bulletin clipping. You can see this gentleman mansplaining the work that he does to his colleague over there, and she's intently listening. I thought that was awesome. I'm glad everybody knows what mansplaining is. Sometimes you have to mansplain what that is. (Reading the slide very fast.) This is where it gets good, OK. This new set of data was named TIGER. ... Raise your hand if you've heard of TIGER. All right. See, great. Raise your hand if you've seen this clip art magic before. Nice. I thought it was so great. It's like a high school t-shirt you would get, right, with the logo? Someone found the TIGER clip art. Amazing, someone should make these into t-shirts. (Reading slide verbatim, very fast.) If you can't tell, that's a boulder in the middle of a road. Someone said earlier it looks like a tank, so just clarifying, that's just a big rock. >> (Reading slide verbatim, very fast.) >> How many of you guys have used a geocoder in your work? Show of hands. Nice. How many need to use one in the future and are hoping to -- all right, cool. Come see us at the workshop tomorrow. I'll have more information on that at the end. >> (Reading slide verbatim, very fast.) Oh, I skipped a page. Sorry. >> AUDIENCE MEMBER: The top of the page. >> The top of this page? Hold on.
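Storing house-number ranges on street segments, the idea behind DIME and later TIGER, is what makes address interpolation possible: a geocoder that knows a block runs from 100 to 198 can estimate where 149 sits by moving proportionally along the segment. The sketch below is a minimal illustration under simplifying assumptions, not how DIME, TIGER, or any production geocoder actually stores data. The `StreetSegment` record and the `interpolate` and `geocode` helpers are hypothetical, each segment carries a single address range (real TIGER data keeps separate left and right ranges, typically odd and even), and latitude/longitude are interpolated linearly, which is only reasonable for short segments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon)


@dataclass
class StreetSegment:
    """Hypothetical street segment between two intersections with one
    house-number range (real TIGER data keeps left/right ranges)."""
    street: str
    from_number: int  # house number at the start node
    to_number: int    # house number at the end node
    start: Point
    end: Point


def interpolate(segment: StreetSegment, house_number: int) -> Optional[Point]:
    """Estimate a point for house_number by linear interpolation along the segment."""
    low, high = sorted((segment.from_number, segment.to_number))
    if not low <= house_number <= high:
        return None  # number is outside this segment's range
    if segment.from_number == segment.to_number:
        return segment.start  # degenerate range: nothing to interpolate over
    # Fraction of the way from the start node to the end node; this works
    # whether the range increases or decreases along the segment.
    t = (house_number - segment.from_number) / (segment.to_number - segment.from_number)
    lat = segment.start[0] + t * (segment.end[0] - segment.start[0])
    lon = segment.start[1] + t * (segment.end[1] - segment.start[1])
    return (lat, lon)


def geocode(segments: List[StreetSegment], street: str, house_number: int) -> Optional[Point]:
    """Return the first interpolated match on a street with the same name."""
    for segment in segments:
        if segment.street.lower() == street.lower():
            point = interpolate(segment, house_number)
            if point is not None:
                return point
    return None
```

For example, `geocode([StreetSegment("Main St", 100, 198, (0.0, 0.0), (0.0, 1.0))], "main st", 149)` returns `(0.0, 0.5)`, halfway along the block. Because the position is only estimated from the range, the result can be off by a good part of a block, which is why per-address point data ("rooftop coverage") is so valuable.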
No, I went too far. This is why reading doesn't work. OK, I'll just read this, and the people rejoiced. (Reading very fast.) So yes, it means addresses or venues. And now here. Sorry about that. (Reading very fast.) These are some of the tags. All right, appendix... (Reading very fast.) That's deep. Take that in. All right, another helpful training tip is to add alternate names for venues that might have them... (Reading very fast.) You can see some of McDonald's various names, so if you were putting that into OSM, adding all the alternative names might be really helpful. So, if there were loads of venues in OSM and each of them had a name, the geocoder would be pretty obedient... (Reading very fast.) (Next slide, reading.) (Next slide, reading.) The end... [applause] That's all the info. If you guys are interested in learning more about the Pelias geocoder, which is the Mapzen geocoding engine, come and check out all of those websites. There are blog posts about what it takes to make a geocoder if you want more details to read to your children, obviously. The documentation for the API is on the docs page. On the GitHub account, everything we do is open source, so the Pelias engine is entirely open source and you can come check it out. It's written in Node.js and uses Elasticsearch. Those are the only two dependencies we have. We pride ourselves on having very few of those and on making our stuff easy to install and easy to contribute to, and we would love to have all of your contributions. Which also brings me to the point that all of the contributions OpenStreetMap editors are making go a long way toward making this possible. So thank you all for contributing the data that makes this work. Also, tomorrow, myself and two of my coworkers will be presenting a workshop on making a Leaflet map with a search box and then customizing that search using Pelias and Mapzen Search. So I hope you guys can come out to that and check it out.
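To see why alternate names help, here is a minimal sketch of name matching over OSM-style tags. It assumes each venue is a plain dictionary of OSM tags and checks `name` plus a few common name variants (`alt_name`, `official_name`, `short_name`, `old_name`), splitting values on `;`, which OSM uses to separate multiple values. The helper names, the choice of keys, and the normalization (lowercasing, dropping apostrophes, collapsing punctuation) are illustrative assumptions, not how Pelias actually indexes names, and exact matching stands in for the fuzzy full-text search a real geocoder would use.

```python
import re
from typing import Dict, List, Set

# OSM keys treated as names here; this particular selection is an assumption.
NAME_KEYS = ("name", "alt_name", "official_name", "short_name", "old_name")


def normalize(text: str) -> str:
    """Lowercase, drop apostrophes, and collapse punctuation/whitespace to single spaces."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    return " ".join(re.findall(r"\w+", text))


def venue_names(tags: Dict[str, str]) -> Set[str]:
    """Collect every normalized name variant from a venue's OSM-style tags."""
    names = set()
    for key in NAME_KEYS:
        value = tags.get(key)
        if not value:
            continue
        for part in value.split(";"):  # ';' separates multiple values in OSM
            if part.strip():
                names.add(normalize(part))
    return names


def search(venues: List[dict], query: str) -> List[dict]:
    """Return venues whose name or any alternate name equals the query after
    normalization (a real geocoder would use fuzzy full-text search)."""
    q = normalize(query)
    return [venue for venue in venues if q in venue_names(venue["tags"])]
```

A venue tagged `name=McDonald's` and `alt_name=McD's;Mickey D's` is found by the query "mickey d's"; without the `alt_name` tag the same query would return nothing, which is the point of adding alternate names in the first place.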
Any questions? I think we have a lot of time for questions. I kind of rushed through the reading. Anything? Yes. AUDIENCE MEMBER: Can you remind me again what powers it? Is it just OpenStreetMap addresses, or is it OpenAddresses as well? >> Sure, yeah. For geocoding purposes we do need additional addresses, as I mentioned; OpenStreetMap isn't amazing for that. So we do pull in a different dataset for that. It's OpenAddresses, an aggregation site for government agencies that have published their data through ArcGIS servers or otherwise; some of it is just zip files. We pull that in alongside OpenStreetMap, and then we pull in GeoNames, which has relatively rich venue data as well, so we end up pulling that in too. And then we use Who's on First, which is the Mapzen gazetteer. It used to be Quattroshapes, which isn't supported anymore. So those are the four datasets that we use, and the import pipeline is easy enough to augment if you wanted to write a custom importer for an additional dataset. So if anyone knows of additional data, let us know and we'll write those importers, or you're welcome to contribute by writing them on your own. Any other questions? All right. Thank you. Oh -- AUDIENCE MEMBER: What's the sequel? >> The sequel? I don't know. I'll have to get back to you; I'll have to think of a good plot for the next one. >> AUDIENCE MEMBER: How do you cover non-US countries? >> That's a great question. It varies by country, so I can't really speak to each one, but because we use OpenAddresses and OpenStreetMap we kind of get the best of both. In some cases OpenAddresses has entire countries available as open data; Germany and some other countries come to mind. That's been really exciting to us; it means we have rooftop coverage for addresses in those areas. And OpenStreetMap, as you know, is a mixed bag depending on the area: most of the populated areas are covered very well and some are not. Yes?
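As a rough idea of what a custom importer does, the hypothetical sketch below reads an OpenAddresses-style CSV and turns each usable row into a normalized address document. The column names (`LON`, `LAT`, `NUMBER`, `STREET`, `POSTCODE`, `ID`) and the output fields are assumptions for illustration; this is not the Pelias document schema or importer API, only the general shape of the job: parse, validate, skip unusable records, and emit one consistent document per place.

```python
import csv
import io
from typing import Dict, Iterator


def import_rows(csv_text: str, source: str = "openaddresses") -> Iterator[Dict]:
    """Hypothetical importer: turn OpenAddresses-style CSV rows into
    normalized address documents, skipping rows that can't be used."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader):
        number = (row.get("NUMBER") or "").strip()
        street = (row.get("STREET") or "").strip()
        if not number or not street:
            continue  # no house number or street: not a matchable address
        try:
            lat = float(row["LAT"])
            lon = float(row["LON"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed coordinates
        yield {
            "source": source,
            "source_id": row.get("ID") or str(index),  # fall back to row position
            "layer": "address",
            "name": f"{number} {street}",
            "housenumber": number,
            "street": street,
            "postcode": (row.get("POSTCODE") or "").strip() or None,
            "center_point": {"lat": lat, "lon": lon},
        }
```

Writing it as a generator lets large files stream through without being loaded into memory, and supporting another dataset would mean replacing only the parsing and validation steps while keeping the same output shape for whatever indexes the documents downstream.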
>> AUDIENCE MEMBER: And multilingual addresses? How do you do that? >> It's coming. Right now we are very focused on the English use case. We do support, you know, countries where the address is in a different format. We're actually incorporating right now a module called libpostal. I don't know if you guys have heard of it, but it's a machine-learning-based library that parses addresses internationally, because address parsing is actually a very difficult problem to solve, considering that every country has a unique format, plus all the languages, and taking diacritics into consideration. That should be coming out on the production server in a month or so. It's called libpostal; I can add that to my slides, and when you guys get them later you'll have access to that. The multilingual thing is hard to solve. For alternate names of countries, places, and cities, OSM is good about that in a lot of cases, which is really helpful, and Who's on First is helping with that as well. Any other questions? All right. Thank you!
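One small piece of the multilingual problem is matching text that differs only in case or diacritics, such as "Zürich" and "Zurich". The sketch below shows a common normalization trick using Python's standard `unicodedata` module: decompose characters with NFKD, drop the combining marks, then apply `casefold()`. This is an illustrative assumption, not what libpostal or Pelias does; libpostal goes much further, and blindly stripping accents can merge names that a given language treats as distinct.

```python
import unicodedata


def fold(text: str) -> str:
    """Fold case and strip diacritics so that e.g. 'Zürich' matches 'zurich'."""
    # NFKD splits characters like 'ü' into 'u' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower(); it also maps 'ß' to 'ss'.
    return stripped.casefold()
```

With this, `fold("São Paulo")` and `fold("Sao Paulo")` both give `"sao paulo"`, and `fold("Straße")` gives `"strasse"`. It also shows the limits: German often writes "ü" as "ue", so `fold("Zürich")` (`"zurich"`) still won't match "Zuerich", which is exactly the kind of language-specific knowledge a dedicated library is needed for.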